Search results for "Locality-sensitive hashing"
Large Scale Knowledge Matching with Balanced Efficiency-Effectiveness Using LSH Forest
2017
Evolving Knowledge Ecosystems were proposed to approach the Big Data challenge, following the hypothesis that knowledge evolves in a way similar to biological systems. Therefore, the inner working of the knowledge ecosystem can be spotted from natural evolution. An evolving knowledge ecosystem consists of Knowledge Organisms, which form a representation of the knowledge, and the environment in which they reside. The environment consists of contexts, which are composed of so-called knowledge tokens. These tokens are ontological fragments extracted from information tokens, in turn, which originate from the streams of information flowing into the ecosystem. In this article we investigate the u…
Locality-Sensitive Hashing for Massive String-Based Ontology Matching
2014
This paper reports initial research results related to the use of locality-sensitive hashing (LSH) for string-based matching of big ontologies. Two ways of transforming the matching problem into an LSH problem are proposed and experimental results are reported. The performed experiments show that using LSH for ontology matching could lead to a very fast matching process. The quality of the alignment achieved in these experiments is comparable to that of state-of-the-art matchers, while the matching itself is much faster. Further research is needed to find out whether the use of different metrics or specific hardware would improve the results.
Locality-sensitive hashing enables signal classification in high-throughput mass spectrometry raw data at scale
2021
Mass spectrometry is an important experimental technique in the field of proteomics. However, analysis of certain mass spectrometry data faces a combination of two challenges: first, even a single experiment produces a large amount of multi-dimensional raw data and, second, signals of interest are not single peaks but patterns of peaks that span the different dimensions. The rapidly growing amount of mass spectrometry data increases the demand for scalable solutions. Existing approaches for signal detection are usually not well suited for processing large amounts of data in parallel or rely on strong assumptions concerning the signals' properties. In this study, it is shown that locali…
Balanced Large Scale Knowledge Matching Using LSH Forest
2015
Evolving Knowledge Ecosystems were proposed recently to approach the Big Data challenge, following the hypothesis that knowledge evolves in a way similar to biological systems. Therefore, the inner working of the knowledge ecosystem can be spotted from natural evolution. An evolving knowledge ecosystem consists of Knowledge Organisms, which form a representation of the knowledge, and the environment in which they reside. The environment consists of contexts, which are composed of so-called knowledge tokens. These tokens are ontological fragments extracted from information tokens, in turn, which originate from the streams of information flowing into the ecosystem. In this article we investig…
Twister Tries
2015
Many commonly used data-mining techniques utilized across research fields perform poorly when used for large data sets. Sequential agglomerative hierarchical non-overlapping clustering is one technique for which the algorithms’ scaling properties prohibit clustering of a large number of items. Besides the unfavorable time complexity of O(n²), these algorithms have a space complexity of O(n²), which can be reduced to O(n) if the time complexity is allowed to rise to O(n² log₂ n). In this paper, we propose the use of locality-sensitive hashing combined with a novel data structure called twister tries to provide an approximate clustering for average linkage. Our approach requires only lin…